-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add asynchronous load method #10327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add asynchronous load method #10327
Conversation
for more information, see https://pre-commit.ci
async def load_async(self, **kwargs) -> Self: | ||
# TODO refactor this to pull out the common chunked_data codepath | ||
|
||
# this blocks on chunked arrays but not on lazily indexed arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI dask has async compute but it seems hard to work in here :)
xarray/tests/test_backends.py
Outdated
not has_zarr_v3, reason="zarr-python <3 did not support async loading" | ||
) | ||
@pytest.mark.asyncio | ||
async def test_load_async(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this parallels test_load
then lets keep this on the base class and use pytest.skip()
on the netCDF subclasses. That way it's easy to keep the two in sync. Let's add a comment requesting future contributors to keep the two in sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I did think about doing this but skipping an inherited method seemed messy.
I made it work in cf1d127, though. Note that the MRO becomes important to get the correctly overridden test to be inherited (so it can be skipped). I think the cleaner solution here would be if we could simply ask the backends whether or not they support async indexing, which is an idea we also discussed for #10579 (comment).
xarray/tests/test_variable.py
Outdated
@pytest.mark.asyncio | ||
async def test_lazy_async_indexing(self) -> None: | ||
v = Variable(dims=("x", "y"), data=LazilyIndexedArray(self.d)) | ||
await self.check_orthogonal_async_indexing(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but we could combine the sync and async checks in one async function and just use that everywhere in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 4f40792, though now I'm a little worried that this pattern
async def check_orthogonal_indexing(self, v):
expected = self.d[[8, 3]][:, [2, 1]]
result = v.isel(x=[8, 3], y=[2, 1])
assert np.allclose(result, expected)
result = await v.isel(x=[8, 3], y=[2, 1]).load_async()
assert np.allclose(result, expected)
might be automatically passing the second assertion by still being in-memory after the first assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could parametrize the test over async and non async calls
@pytest.mark.parametrize("use_async", [True, False])
async def check_orthogonal_indexing(self, v, use_async):
expected = self.d[[8, 3]][:, [2, 1]]
if use_async:
result = await v.isel(x=[8, 3], y=[2, 1]).load_async()
else:
result = v.isel(x=[8, 3], y=[2, 1])
assert np.allclose(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use assert not v._in_memory
to be really sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with the parametrization idea, though the syntax has to be a little messier because you can't parametrize normal functions in pytest. a074a25
@pytest.mark.parametrize("cls_name", ["Variable", "DataArray", "Dataset"]) | ||
@pytest.mark.parametrize( | ||
"indexer, method, target_zarr_class", | ||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏🏾 👏🏾 👏🏾
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work!
I left some minor comments that should be easy to address.
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
…ata using zarr-python v2
Adds an
.async_load()
method toVariable
, which works by plumbing asyncget_duck_array
all the way down until it finally gets to the async methods zarr v3 exposes.Needs a lot of refactoring before it could be merged, but it works.
whats-new.rst
api.rst
API:
Variable.load_async
DataArray.load_async
Dataset.load_async
DataTree.load_async
load_dataset
?load_dataarray
?